Overview
Brought to you by YData
Dataset statistics
| Number of variables | 34 |
|---|---|
| Number of observations | 19717 |
| Missing cells | 221379 |
| Missing cells (%) | 33.0% |
| Duplicate rows | 157 |
| Duplicate rows (%) | 0.8% |
| Total size in memory | 5.1 MiB |
| Average record size in memory | 272.0 B |
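The two percentages in this table can be re-derived from the raw counts. Missing cells are measured against every cell (variables x observations). Duplicate rows are measured against the number of observations. A minimal Python check using only the figures reported above:

```python
# Dataset-level figures copied from the statistics table above.
n_variables = 34
n_observations = 19717
missing_cells = 221379
duplicate_rows = 157

# Missing cells are measured against every cell in the table (rows x columns).
missing_pct = missing_cells / (n_variables * n_observations)

# Duplicate rows are measured against the number of observations.
duplicate_pct = duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1%}")
print(f"Duplicate rows (%): {duplicate_pct:.1%}")
```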
Variable types
| Categorical | 16 |
|---|---|
| Text | 18 |
Reproduction
| Analysis started | 2024-11-05 15:57:55.780002 |
|---|---|
| Analysis finished | 2024-11-05 15:58:03.778063 |
| Duration | 8 seconds |
| Software version | ydata-profiling v4.12.0 |
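The reported duration is the difference between the two timestamps, rounded to whole seconds. A quick check with Python's standard datetime module:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction table.
started = datetime.fromisoformat("2024-11-05 15:57:55.780002")
finished = datetime.fromisoformat("2024-11-05 15:58:03.778063")

# Elapsed wall-clock time in seconds, then rounded the way the report shows it.
elapsed = (finished - started).total_seconds()
print(f"{elapsed:.3f} s, reported as {round(elapsed)} seconds")
```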
Variables
What is your age (# years)?
Categorical
| Distinct | 11 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.9898565 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 22-24 |
|---|---|
| 2nd row | 40-44 |
| 3rd row | 55-59 |
| 4th row | 40-44 |
| 5th row | 22-24 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 25-29 | 4458 | 22.6% |
| 22-24 | 3610 | 18.3% |
| 30-34 | 3120 | 15.8% |
| 18-21 | 2502 | 12.7% |
| 35-39 | 2087 | 10.6% |
| 40-44 | 1439 | 7.3% |
| 45-49 | 949 | 4.8% |
| 50-54 | 692 | 3.5% |
| 55-59 | 422 | 2.1% |
| 60-69 | 338 | 1.7% |
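Each Frequency (%) in this table is the value's count divided by the 19717 observations. A minimal sketch that reproduces the column from the counts above. The ten listed counts sum to 19617, so the remaining 100 rows belong to the eleventh distinct value, which this table does not list:

```python
# Age counts copied from the Common Values table above.
age_counts = {
    "25-29": 4458, "22-24": 3610, "30-34": 3120, "18-21": 2502, "35-39": 2087,
    "40-44": 1439, "45-49": 949, "50-54": 692, "55-59": 422, "60-69": 338,
}
n_observations = 19717  # all rows, including the value not listed in the table

# Frequency (%) = count / number of observations, shown with one decimal place.
frequencies = {value: f"{count / n_observations:.1%}" for value, count in age_counts.items()}

for value, pct in frequencies.items():
    print(f"{value}: {pct}")

# The listed counts leave this many rows for the unlisted eleventh value.
print(n_observations - sum(age_counts.values()))
```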
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 22248 | 22.6% |
| - | 19617 | 19.9% |
| 4 | 13637 | 13.9% |
| 3 | 10414 | 10.6% |
| 5 | 10144 | 10.3% |
| 9 | 8254 | 8.4% |
| 0 | 5689 | 5.8% |
| 1 | 5004 | 5.1% |
| 8 | 2502 | 2.5% |
| 6 | 676 | 0.7% |
| Other values (2) | 200 | 0.2% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 98385 | 100.0% |
All 98385 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
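The Unicode tables for this variable report only (unknown). For comparison, Python's standard unicodedata module can assign a general category to each character. This is not how ydata-profiling builds its tables, and the standard library has no script or block lookup, so only categories are shown. The two characters hidden under Other values (2) are left out:

```python
import unicodedata
from collections import Counter

# Character counts for the age column, copied from the Most occurring characters table.
char_counts = {
    "2": 22248, "-": 19617, "4": 13637, "3": 10414, "5": 10144,
    "9": 8254, "0": 5689, "1": 5004, "8": 2502, "6": 676,
}

# Group the counts by Unicode general category (e.g. "Nd" = decimal digit, "Pd" = dash).
by_category = Counter()
for ch, count in char_counts.items():
    by_category[unicodedata.category(ch)] += count

print(by_category.most_common())
```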
What is your gender?
Categorical
Imbalance 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 23 |
|---|---|
| Median length | 4 |
| Mean length | 4.5826951 |
| Min length | 4 |
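The length statistics are computed over one string per respondent. A sketch that reproduces them from this question's value counts, which are copied from its Common Values table. It assumes that length means the number of characters in each answer:

```python
import statistics

# Gender answer counts, copied from this question's Common Values table.
gender_counts = {
    "Male": 16138,
    "Female": 3212,
    "Prefer not to say": 318,
    "Prefer to self-describe": 49,
}

# One length per respondent: each answer's length repeated by its count.
lengths = [len(answer) for answer, count in gender_counts.items() for _ in range(count)]

max_length = max(lengths)
median_length = statistics.median(lengths)
mean_length = sum(lengths) / len(lengths)
min_length = min(lengths)

print(max_length, median_length, round(mean_length, 7), min_length)
```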
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Male |
| 3rd row | Female |
| 4th row | Male |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Male | 16138 | 81.8% |
| Female | 3212 | 16.3% |
| Prefer not to say | 318 | 1.6% |
| Prefer to self-describe | 49 | 0.2% |
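This variable carries an Imbalance alert. The sketch below computes an entropy-based imbalance score, 1 - H / log2(k), where H is the Shannon entropy of the category frequencies and k is the number of categories. We assume that ydata-profiling uses a score of this form with a default alert threshold of 0.5. Treat both the formula and the threshold as assumptions rather than the library's documented behaviour. Missing values are excluded from the counts:

```python
import math

ASSUMED_THRESHOLD = 0.5  # assumption: default alert threshold, not stated in this report


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for perfectly even counts, approaching 1 as one category dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


gender = [16138, 3212, 318, 49]                # this question; flagged "Imbalance"
company_size = [4025, 3160, 2641, 2329, 1847]  # company-size question, missing excluded

for name, counts in [("gender", gender), ("company size", company_size)]:
    score = imbalance_score(counts)
    print(name, round(score, 3), score > ASSUMED_THRESHOLD)
```

Under these assumptions the gender column scores above 0.5. The company-size column has five fairly even categories and is flagged only for missing values; it scores close to 0.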
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| male | 16138 | 77.7% |
| female | 3212 | 15.5% |
| prefer | 367 | 1.8% |
| to | 367 | 1.8% |
| not | 318 | 1.5% |
| say | 318 | 1.5% |
| self-describe | 49 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 23443 | 25.9% |
| a | 19668 | 21.8% |
| l | 19399 | 21.5% |
| M | 16138 | 17.9% |
| F | 3212 | 3.6% |
| m | 3212 | 3.6% |
| (space) | 1052 | 1.2% |
| r | 783 | 0.9% |
| o | 685 | 0.8% |
| t | 685 | 0.8% |
| Other values (10) | 2080 | 2.3% |
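The word and character tables for this question can be rebuilt from the four answer counts. Each answer contributes its words and characters once per respondent who gave it. The tokenizer below (lowercase, then split on whitespace) is a simplification and an assumption. Elsewhere in this report, tokens such as usd (from ($USD)) show that the library also strips punctuation, which this sketch does not do:

```python
from collections import Counter

# Gender answer counts, copied from this question's Common Values table.
gender_counts = {
    "Male": 16138,
    "Female": 3212,
    "Prefer not to say": 318,
    "Prefer to self-describe": 49,
}

words = Counter()
chars = Counter()
for answer, count in gender_counts.items():
    # Simplified tokenizer (assumption): lowercase, then split on whitespace.
    for word in answer.lower().split():
        words[word] += count
    # Characters are counted case-sensitively, spaces included.
    for ch in answer:
        chars[ch] += count

print(words.most_common(4))
print(chars.most_common(4))
```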
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 90357 | 100.0% |
All 90357 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
| Distinct | 59 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 52 |
|---|---|
| Median length | 28 |
| Mean length | 10.232642 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | France |
|---|---|
| 2nd row | India |
| 3rd row | Germany |
| 4th row | Australia |
| 5th row | India |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| india | 4786 | 14.3% |
| of | 3736 | 11.2% |
| united | 3567 | 10.6% |
| states | 3085 | 9.2% |
| america | 3085 | 9.2% |
| other | 1054 | 3.1% |
| brazil | 728 | 2.2% |
| japan | 673 | 2.0% |
| russia | 626 | 1.9% |
| china | 574 | 1.7% |
| Other values (63) | 11583 | 34.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 24865 | 12.3% |
| i | 19357 | 9.6% |
| e | 17291 | 8.6% |
| n | 16760 | 8.3% |
| t | 14087 | 7.0% |
| (space) | 13780 | 6.8% |
| d | 11371 | 5.6% |
| r | 11349 | 5.6% |
| o | 7043 | 3.5% |
| I | 6091 | 3.0% |
| Other values (39) | 59763 | 29.6% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 201757 | 100.0% |
All 201757 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 394 |
| Missing (%) | 2.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 65 |
|---|---|
| Median length | 15 |
| Mean length | 18.286446 |
| Min length | 15 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Master’s degree |
|---|---|
| 2nd row | Professional degree |
| 3rd row | Professional degree |
| 4th row | Master’s degree |
| 5th row | Bachelor’s degree |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Master’s degree | 8549 | 43.4% |
| Bachelor’s degree | 5993 | 30.4% |
| Doctoral degree | 2767 | 14.0% |
| Some college/university study without earning a bachelor’s degree | 837 | 4.2% |
| Professional degree | 611 | 3.1% |
| I prefer not to answer | 333 | 1.7% |
| No formal education past high school | 233 | 1.2% |
| (Missing) | 394 | 2.0% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| degree | 18757 | 41.1% |
| master’s | 8549 | 18.7% |
| bachelor’s | 6830 | 15.0% |
| doctoral | 2767 | 6.1% |
| some | 837 | 1.8% |
| college/university | 837 | 1.8% |
| study | 837 | 1.8% |
| without | 837 | 1.8% |
| earning | 837 | 1.8% |
| a | 837 | 1.8% |
| Other values (12) | 3674 | 8.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 77678 | 22.0% |
| r | 40420 | 11.4% |
| s | 27623 | 7.8% |
| (space) | 26276 | 7.4% |
| a | 21463 | 6.1% |
| g | 20664 | 5.8% |
| d | 19827 | 5.6% |
| o | 17928 | 5.1% |
| t | 15796 | 4.5% |
| ’ | 15379 | 4.4% |
| Other values (21) | 70295 | 19.9% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 353349 | 100.0% |
All 353349 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
Select the title most similar to your current role (or most recent title if retired)
Categorical
Missing 
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 610 |
| Missing (%) | 3.1% |
| Memory size | 154.2 KiB |
Length
| Max length | 23 |
|---|---|
| Median length | 18 |
| Mean length | 12.61276 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Software Engineer |
|---|---|
| 2nd row | Software Engineer |
| 3rd row | Other |
| 4th row | Other |
| 5th row | Data Scientist |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Data Scientist | 4085 | 20.7% |
| Student | 4014 | 20.4% |
| Software Engineer | 2705 | 13.7% |
| Other | 1690 | 8.6% |
| Data Analyst | 1598 | 8.1% |
| Research Scientist | 1470 | 7.5% |
| Not employed | 942 | 4.8% |
| Business Analyst | 778 | 3.9% |
| Product/Project Manager | 723 | 3.7% |
| Data Engineer | 624 | 3.2% |
| Other values (2) | 478 | 2.4% |
| (Missing) | 610 | 3.1% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| data | 6307 | 19.6% |
| scientist | 5555 | 17.3% |
| student | 4014 | 12.5% |
| engineer | 3485 | 10.8% |
| software | 2705 | 8.4% |
| analyst | 2376 | 7.4% |
| other | 1690 | 5.3% |
| research | 1470 | 4.6% |
| not | 942 | 2.9% |
| employed | 942 | 2.9% |
| Other values (5) | 2702 | 8.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| t | 35726 | 14.8% |
| e | 28138 | 11.7% |
| a | 21723 | 9.0% |
| n | 20738 | 8.6% |
| i | 16339 | 6.8% |
| (space) | 13081 | 5.4% |
| S | 12596 | 5.2% |
| s | 12213 | 5.1% |
| r | 11519 | 4.8% |
| c | 8793 | 3.6% |
| Other values (23) | 60126 | 24.9% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 240992 | 100.0% |
All 240992 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
What is the size of the company where you are employed?
Categorical
Missing 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 5715 |
| Missing (%) | 29.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 18 |
| Mean length | 16.76282 |
| Min length | 14 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1000-9,999 employees |
|---|---|
| 2nd row | > 10,000 employees |
| 3rd row | > 10,000 employees |
| 4th row | 0-49 employees |
| 5th row | 0-49 employees |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0-49 employees | 4025 | 20.4% |
| > 10,000 employees | 3160 | 16.0% |
| 1000-9,999 employees | 2641 | 13.4% |
| 50-249 employees | 2329 | 11.8% |
| 250-999 employees | 1847 | 9.4% |
| (Missing) | 5715 | 29.0% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| employees | 14002 | 44.9% |
| 0-49 | 4025 | 12.9% |
| > | 3160 | 10.1% |
| 10,000 | 3160 | 10.1% |
| 1000-9,999 | 2641 | 8.5% |
| 50-249 | 2329 | 7.5% |
| 250-999 | 1847 | 5.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 42006 | 17.9% |
| 0 | 28764 | 12.3% |
| 9 | 22459 | 9.6% |
| (space) | 17162 | 7.3% |
| o | 14002 | 6.0% |
| s | 14002 | 6.0% |
| y | 14002 | 6.0% |
| l | 14002 | 6.0% |
| p | 14002 | 6.0% |
| m | 14002 | 6.0% |
| Other values (7) | 40310 | 17.2% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 234713 | 100.0% |
All 234713 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 6094 |
| Missing (%) | 30.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 2.9286501 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 20+ |
| 3rd row | 20+ |
| 4th row | 0 |
| 5th row | 3-4 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 20+ | 3178 | 16.1% |
| 1-2 | 3005 | 15.2% |
| 3-4 | 2319 | 11.8% |
| 0 | 1880 | 9.5% |
| 5-9 | 1847 | 9.4% |
| 10-14 | 967 | 4.9% |
| 15-19 | 427 | 2.2% |
| (Missing) | 6094 | 30.9% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 3178 | 23.3% |
| 1-2 | 3005 | 22.1% |
| 3-4 | 2319 | 17.0% |
| 0 | 1880 | 13.8% |
| 5-9 | 1847 | 13.6% |
| 10-14 | 967 | 7.1% |
| 15-19 | 427 | 3.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| - | 8565 | 21.5% |
| 2 | 6183 | 15.5% |
| 0 | 6025 | 15.1% |
| 1 | 5793 | 14.5% |
| 4 | 3286 | 8.2% |
| + | 3178 | 8.0% |
| 3 | 2319 | 5.8% |
| 5 | 2274 | 5.7% |
| 9 | 2274 | 5.7% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 39897 | 100.0% |
All 39897 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
Does your current employer incorporate machine learning methods into their business?
Categorical
Missing 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 6490 |
| Missing (%) | 32.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 89 |
|---|---|
| Median length | 86 |
| Mean length | 66.814017 |
| Min length | 13 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | I do not know |
|---|---|
| 2nd row | We have well established ML methods (i.e., models in production for more than 2 years) |
| 3rd row | I do not know |
| 4th row | No (we do not use ML methods) |
| 5th row | We have well established ML methods (i.e., models in production for more than 2 years) |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| We are exploring ML methods (and may one day put a model into production) | 2812 | 14.3% |
| We recently started using ML methods (i.e., models in production for less than 2 years) | 2731 | 13.9% |
| We have well established ML methods (i.e., models in production for more than 2 years) | 2528 | 12.8% |
| No (we do not use ML methods) | 2415 | 12.2% |
| We use ML methods for generating insights (but do not put working models into production) | 1550 | 7.9% |
| I do not know | 1191 | 6.0% |
| (Missing) | 6490 | 32.9% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| we | 12036 | 7.4% |
| ml | 12036 | 7.4% |
| methods | 12036 | 7.4% |
| production | 9621 | 5.9% |
| for | 6809 | 4.2% |
| models | 6809 | 4.2% |
| years | 5259 | 3.2% |
| 2 | 5259 | 3.2% |
| than | 5259 | 3.2% |
| i.e | 5259 | 3.2% |
| Other values (29) | 82789 | 50.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 149945 | 17.0% |
| e | 83276 | 9.4% |
| o | 75690 | 8.6% |
| t | 56167 | 6.4% |
| n | 50946 | 5.8% |
| d | 47317 | 5.4% |
| s | 47149 | 5.3% |
| i | 38772 | 4.4% |
| r | 38403 | 4.3% |
| a | 33915 | 3.8% |
| Other values (24) | 262169 | 29.7% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 883749 | 100.0% |
All 883749 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
What is your current yearly compensation (approximate $USD)?
Categorical
Missing 
| Distinct | 25 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 7220 |
| Missing (%) | 36.6% |
| Memory size | 154.2 KiB |
Length
| Max length | 15 |
|---|---|
| Median length | 13 |
| Mean length | 12.04361 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 30,000-39,999 |
|---|---|
| 2nd row | 5,000-7,499 |
| 3rd row | 250,000-299,999 |
| 4th row | 4,000-4,999 |
| 5th row | 60,000-69,999 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| $0-999 | 1513 | 7.7% |
| 10,000-14,999 | 833 | 4.2% |
| 100,000-124,999 | 750 | 3.8% |
| 30,000-39,999 | 728 | 3.7% |
| 40,000-49,999 | 719 | 3.6% |
| 50,000-59,999 | 704 | 3.6% |
| 1,000-1,999 | 599 | 3.0% |
| 60,000-69,999 | 576 | 2.9% |
| 5,000-7,499 | 536 | 2.7% |
| 15,000-19,999 | 529 | 2.7% |
| Other values (15) | 5010 | 25.4% |
| (Missing) | 7220 | 36.6% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| 0-999 | 1513 | 12.0% |
| 10,000-14,999 | 833 | 6.6% |
| 100,000-124,999 | 750 | 6.0% |
| 30,000-39,999 | 728 | 5.8% |
| 40,000-49,999 | 719 | 5.7% |
| 50,000-59,999 | 704 | 5.6% |
| 1,000-1,999 | 599 | 4.8% |
| 60,000-69,999 | 576 | 4.6% |
| 5,000-7,499 | 536 | 4.3% |
| 15,000-19,999 | 529 | 4.2% |
| Other values (16) | 5093 | 40.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 44336 | 29.5% |
| 0 | 42462 | 28.2% |
| , | 21885 | 14.5% |
| - | 12414 | 8.2% |
| 1 | 7256 | 4.8% |
| 4 | 5309 | 3.5% |
| 5 | 4502 | 3.0% |
| 2 | 4489 | 3.0% |
| 3 | 2140 | 1.4% |
| 7 | 1992 | 1.3% |
| Other values (5) | 3724 | 2.5% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 150509 | 100.0% |
All 150509 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
Missing 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 7467 |
| Missing (%) | 37.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 17 |
|---|---|
| Median length | 15 |
| Mean length | 10.101388 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | $0 (USD) |
|---|---|
| 2nd row | > $100,000 ($USD) |
| 3rd row | $10,000-$99,999 |
| 4th row | $0 (USD) |
| 5th row | $10,000-$99,999 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| $0 (USD) | 4038 | 20.5% |
| $100-$999 | 2335 | 11.8% |
| $1000-$9,999 | 2123 | 10.8% |
| $1-$99 | 1485 | 7.5% |
| $10,000-$99,999 | 1268 | 6.4% |
| > $100,000 ($USD) | 1001 | 5.1% |
| (Missing) | 7467 | 37.9% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| usd | 5039 | 27.6% |
| 0 | 4038 | 22.1% |
| 100-$999 | 2335 | 12.8% |
| 1000-$9,999 | 2123 | 11.6% |
| 1-$99 | 1485 | 8.1% |
| 10,000-$99,999 | 1268 | 6.9% |
| > | 1001 | 5.5% |
| 100,000 | 1001 | 5.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 25154 | 20.3% |
| 9 | 24807 | 20.0% |
| $ | 20462 | 16.5% |
| 1 | 8212 | 6.6% |
| - | 7211 | 5.8% |
| (space) | 6040 | 4.9% |
| , | 5660 | 4.6% |
| ( | 5039 | 4.1% |
| U | 5039 | 4.1% |
| S | 5039 | 4.1% |
| Other values (3) | 11079 | 9.0% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 123742 | 100.0% |
All 123742 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
| Distinct | 4975 |
|---|---|
| Distinct (%) | 25.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 89 |
|---|---|
| Median length | 86 |
| Mean length | 63.65938 |
| Min length | 18 |
Unique
| Unique | 4355 |
|---|---|
| Unique (%) | 22.1% |
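Distinct counts every different value. Unique counts only the values that occur exactly once. That is why this column reports 4975 distinct but 4355 unique values, and why the single-choice questions in this report show 0 unique values even though they have several distinct answers. A toy illustration on a hypothetical list:

```python
from collections import Counter

# Hypothetical toy column, for illustration only.
column = ["a", "b", "b", "c", "c", "c", "d"]

counts = Counter(column)
n_distinct = len(counts)                              # values seen at least once
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)  # 4 2
```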
Sample
| 1st row | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 |
|---|---|
| 2nd row | Cloud-based data software & APIs (AWS, GCP, Azure, etc.), -1, -1, -1, -1, 0 |
| 3rd row | -1, -1, -1, -1, -1 |
| 4th row | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 |
| 5th row | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 |
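The sample rows suggest that each record is a free-text label followed by five comma-separated integer codes, and that the label may itself contain commas. Below is a hypothetical helper, split_record, that separates the two under that assumption. The assumption is inferred from the five sample rows only, and the report does not say what the codes mean:

```python
def split_record(value: str, n_codes: int = 5):
    """Hypothetical helper: split a record into (label, trailing integer codes).

    Assumes, from the sample rows only, that every record ends with exactly
    n_codes comma-separated integers and that the label may contain commas.
    """
    parts = value.split(", ")
    label = ", ".join(parts[:-n_codes])  # empty when the record has no label
    codes = [int(p) for p in parts[-n_codes:]]
    return label, codes


samples = [
    "Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1",
    "-1, -1, -1, -1, -1",
]
for row in samples:
    print(split_record(row))
```

A record with a different number of trailing codes would be mis-split, so the five-code assumption should be checked against the full column before relying on it.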
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 85278 | 43.2% |
| etc | 14500 | 7.3% |
| local | 8475 | 4.3% |
| development | 8475 | 4.3% |
| environments | 8475 | 4.3% |
| rstudio | 8475 | 4.3% |
| jupyterlab | 8475 | 4.3% |
| software | 6025 | 3.1% |
| statistical | 3956 | 2.0% |
| excel | 3061 | 1.6% |
| Other values (2861) | 42147 | 21.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 177625 | 14.2% |
| , | 125627 | 10.0% |
| e | 95128 | 7.6% |
| 1 | 90608 | 7.2% |
| - | 85279 | 6.8% |
| t | 76555 | 6.1% |
| o | 55119 | 4.4% |
| a | 41050 | 3.3% |
| c | 38771 | 3.1% |
| s | 37495 | 3.0% |
| Other values (45) | 431915 | 34.4% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 1255172 | 100.0% |
All 1255172 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
How long have you been writing code to analyze data (at work or at school)?
Categorical
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 4090 |
| Missing (%) | 20.7% |
| Memory size | 154.2 KiB |
Length
| Max length | 25 |
|---|---|
| Median length | 9 |
| Mean length | 10.140142 |
| Min length | 9 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1-2 years |
|---|---|
| 2nd row | I have never written code |
| 3rd row | 1-2 years |
| 4th row | < 1 years |
| 5th row | 20+ years |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1-2 years | 4061 | 20.6% |
| < 1 years | 3828 | 19.4% |
| 3-5 years | 3365 | 17.1% |
| 5-10 years | 1887 | 9.6% |
| 10-20 years | 1045 | 5.3% |
| I have never written code | 865 | 4.4% |
| 20+ years | 576 | 2.9% |
| (Missing) | 4090 | 20.7% |
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| years | 14762 | 39.2% |
| 1-2 | 4061 | 10.8% |
| < | 3828 | 10.2% |
| 1 | 3828 | 10.2% |
| 3-5 | 3365 | 8.9% |
| 5-10 | 1887 | 5.0% |
| 10-20 | 1045 | 2.8% |
| i | 865 | 2.3% |
| have | 865 | 2.3% |
| never | 865 | 2.3% |
| Other values (3) | 2306 | 6.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 22050 | 13.9% |
| e | 19087 | 12.0% |
| r | 16492 | 10.4% |
| a | 15627 | 9.9% |
| y | 14762 | 9.3% |
| s | 14762 | 9.3% |
| 1 | 10821 | 6.8% |
| - | 10358 | 6.5% |
| 2 | 5682 | 3.6% |
| 5 | 5252 | 3.3% |
| Other values (14) | 23567 | 14.9% |
Most occurring categories, scripts, and blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 158460 | 100.0% |
All 158460 characters fall under a single (unknown) Unicode category, script, and block, so the per-category, per-script, and per-block character breakdowns are identical to the Most occurring characters table above.
What programming language would you recommend an aspiring data scientist to learn first?
Categorical
Imbalance Missing
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 5340 |
| Missing (%) | 27.1% |
| Memory size | 154.2 KiB |
Length
| Max length | 10 |
|---|---|
| Median length | 6 |
| Mean length | 5.2444182 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Python |
|---|---|
| 2nd row | Python |
| 3rd row | Python |
| 4th row | Java |
| 5th row | Python |
Common Values
| Value | Count | Frequency (%) |
| Python | 11316 | 57.4% |
| R | 1343 | 6.8% |
| SQL | 817 | 4.1% |
| C++ | 199 | 1.0% |
| MATLAB | 162 | 0.8% |
| C | 153 | 0.8% |
| Other | 127 | 0.6% |
| Java | 104 | 0.5% |
| None | 69 | 0.3% |
| Javascript | 47 | 0.2% |
| Other values (2) | 40 | 0.2% |
| (Missing) | 5340 | 27.1% |
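The Frequency (%) column in the Common Values tables appears to be computed against all 19,717 observations, with missing rows included. R's 1,343 answers give 6.8%, and the 5,340 missing rows give the 27.1% listed as Missing (%) for this question. The minimal check below uses only counts copied from this table.

```python
# Counts copied from the Common Values table above.
TOTAL_ROWS = 19717  # "Number of observations" in the dataset overview

counts = {"Python": 11316, "R": 1343, "SQL": 817, "(Missing)": 5340}

def frequency_pct(count: int, total: int = TOTAL_ROWS) -> float:
    """Share of all rows, missing rows included, rounded to one decimal."""
    return round(100 * count / total, 1)

for value, count in counts.items():
    print(f"{value}: {frequency_pct(count)}%")
```

This prints 57.4% for Python, 6.8% for R, 4.1% for SQL and 27.1% for (Missing). The last three agree with the percentages already shown in the report.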
Words
| Value | Count | Frequency (%) |
| python | 11316 | |
| r | 1343 | 9.3% |
| sql | 817 | 5.7% |
| c | 352 | 2.4% |
| matlab | 162 | 1.1% |
| other | 127 | 0.9% |
| java | 104 | 0.7% |
| none | 69 | 0.5% |
| javascript | 47 | 0.3% |
| bash | 35 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 11495 | |
| h | 11478 | |
| o | 11385 | |
| n | 11385 | |
| y | 11321 | |
| P | 11316 | |
| R | 1343 | 1.8% |
| L | 979 | 1.3% |
| S | 822 | 1.1% |
| Q | 817 | 1.1% |
| Other values (17) | 3058 | 4.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 75399 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 75399 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 75399 |
Have you ever used a TPU (tensor processing unit)?
Categorical
Imbalance  Missing 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 5514 |
| Missing (%) | 28.0% |
| Memory size | 154.2 KiB |
| Never | 11495 |
|---|---|
| Once | 1320 |
| 2-5 times | 1037 |
| 6-24 times | 193 |
| > 25 times | 158 |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 5.3226783 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Never |
|---|---|
| 2nd row | Once |
| 3rd row | Never |
| 4th row | Never |
| 5th row | 6-24 times |
Common Values
| Value | Count | Frequency (%) |
| Never | 11495 | 58.3% |
| Once | 1320 | 6.7% |
| 2-5 times | 1037 | 5.3% |
| 6-24 times | 193 | 1.0% |
| > 25 times | 158 | 0.8% |
| (Missing) | 5514 | 28.0% |
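The Imbalance alert on this question reflects how strongly `Never` dominates the answered rows. One common way to score this is 1 minus the entropy of the category shares, normalized by ln(k) for k categories, with a flag raised above 0.5. We assume this roughly matches what the alert measures, but both the formula and the threshold are assumptions and should be checked against the ydata-profiling documentation. The sketch compares this question with the machine-learning-years question, which the report flags only for Missing.

```python
import math

def imbalance_score(counts: list[int]) -> float:
    """Assumed score: 1 - H(p) / ln(k) over the k observed categories.

    0 for a perfectly even split, close to 1 when one value dominates.
    Missing rows are excluded.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

THRESHOLD = 0.5  # assumed alert threshold

tpu = [11495, 1320, 1037, 193, 158]                      # this question
ml_years = [5149, 3798, 1840, 1080, 927, 869, 336, 183]  # "For how many years ..." question
for name, counts in [("TPU", tpu), ("ML years", ml_years)]:
    score = imbalance_score(counts)
    print(name, round(score, 2), score > THRESHOLD)
```

With these counts the score is about 0.57 for this question and about 0.19 for the machine-learning-years question. That is consistent with only the first carrying an Imbalance alert.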
Words
| Value | Count | Frequency (%) |
| never | 11495 | |
| times | 1388 | 8.8% |
| once | 1320 | 8.4% |
| 2-5 | 1037 | 6.6% |
| 6-24 | 193 | 1.2% |
|  | 158 | 1.0% |
| 25 | 158 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 25698 | |
| N | 11495 | |
| v | 11495 | |
| r | 11495 | |
| (space) | 1546 | 2.0% |
| s | 1388 | 1.8% |
| 2 | 1388 | 1.8% |
| t | 1388 | 1.8% |
| i | 1388 | 1.8% |
| m | 1388 | 1.8% |
| Other values (8) | 6929 | 9.2% |
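The 1,546-count row in this character table is the space character. This can be checked against the Common Values counts by weighting each answer by how often it was chosen. Doing so reproduces 1,546 spaces, the 25,698 occurrences of `e`, and the 75,598 characters reported under Most occurring categories. The minimal sketch below uses only the five answer options and counts from this question.

```python
# Answer counts copied from the Common Values table of this question.
answers = {"Never": 11495, "Once": 1320, "2-5 times": 1037,
           "6-24 times": 193, "> 25 times": 158}

def char_count(answers: dict[str, int], ch: str) -> int:
    """Occurrences of `ch` across all answered rows, weighted by answer count."""
    return sum(text.count(ch) * n for text, n in answers.items())

spaces = char_count(answers, " ")
letter_e = char_count(answers, "e")
total_chars = sum(len(text) * n for text, n in answers.items())
print(spaces, letter_e, total_chars)
```

Dividing the total by the 14,203 answered rows gives the reported Mean length of about 5.32.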
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 75598 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 75598 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 75598 |
For how many years have you used machine learning methods?
Categorical
Missing 
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 5535 |
| Missing (%) | 28.1% |
| Memory size | 154.2 KiB |
| < 1 years | 5149 |
|---|---|
| 1-2 years | 3798 |
| 2-3 years | 1840 |
| 3-4 years | 1080 |
| 4-5 years | 927 |
| Other values (3) | 1388 |
Length
| Max length | 11 |
|---|---|
| Median length | 9 |
| Mean length | 9.1086589 |
| Min length | 9 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1-2 years |
|---|---|
| 2nd row | 2-3 years |
| 3rd row | < 1 years |
| 4th row | 10-15 years |
| 5th row | 2-3 years |
Common Values
| Value | Count | Frequency (%) |
| < 1 years | 5149 | 26.1% |
| 1-2 years | 3798 | 19.3% |
| 2-3 years | 1840 | 9.3% |
| 3-4 years | 1080 | 5.5% |
| 4-5 years | 927 | 4.7% |
| 5-10 years | 869 | 4.4% |
| 10-15 years | 336 | 1.7% |
| 20+ years | 183 | 0.9% |
| (Missing) | 5535 | 28.1% |
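Several statistics in this question's Length table can be rebuilt from the Common Values counts. Weighting each answer's length by how often it was chosen gives 129,179 characters in total, which equals the (unknown) count under Most occurring categories. Dividing that total by the 14,182 non-missing answers reproduces the Mean length of 9.1086589. The sketch below uses only the counts in the table above.

```python
# Answer counts copied from the Common Values table above (missing rows excluded).
answers = {
    "< 1 years": 5149, "1-2 years": 3798, "2-3 years": 1840, "3-4 years": 1080,
    "4-5 years": 927, "5-10 years": 869, "10-15 years": 336, "20+ years": 183,
}

def length_stats(answers: dict[str, int]) -> tuple[int, float, int, int]:
    """Return (total characters, mean, min, max length) over all answered rows."""
    n = sum(answers.values())
    total = sum(len(text) * count for text, count in answers.items())
    lengths = [len(text) for text in answers]
    return total, total / n, min(lengths), max(lengths)

total, mean, shortest, longest = length_stats(answers)
print(total, round(mean, 7), shortest, longest)
```

The minimum and maximum (9 and 11) also match the Min length and Max length rows.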
Words
| Value | Count | Frequency (%) |
| years | 14182 | |
|  | 5149 | 15.4% |
| 1 | 5149 | 15.4% |
| 1-2 | 3798 | 11.3% |
| 2-3 | 1840 | 5.5% |
| 3-4 | 1080 | 3.2% |
| 4-5 | 927 | 2.8% |
| 5-10 | 869 | 2.6% |
| 10-15 | 336 | 1.0% |
| 20 | 183 | 0.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 19331 | |
| y | 14182 | |
| e | 14182 | |
| a | 14182 | |
| r | 14182 | |
| s | 14182 | |
| 1 | 10488 | |
| - | 8850 | |
| 2 | 5821 | 4.5% |
| < | 5149 | 4.0% |
| Other values (5) | 8630 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 129179 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 129179 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 129179 |
| Distinct | 98 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 10491 |
| Missing (%) | 53.2% |
| Memory size | 154.2 KiB |
Length
| Max length | 485 |
|---|---|
| Median length | 364 |
| Mean length | 207.43822 |
| Min length | 5 |
Unique
| Unique | 13 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Analyze and understand data to influence product or business decisions, Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows |
|---|---|
| 2nd row | Build prototypes to explore applying machine learning to new areas, Do research that advances the state of the art of machine learning |
| 3rd row | Analyze and understand data to influence product or business decisions, Experimentation and iteration to improve existing ML models, Do research that advances the state of the art of machine learning |
| 4th row | Analyze and understand data to influence product or business decisions, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows |
| 5th row | Other |
Words
| Value | Count | Frequency (%) |
| to | 19758 | 7.1% |
| and | 13362 | 4.8% |
| data | 13223 | 4.7% |
| build | 11895 | 4.3% |
| machine | 10688 | 3.8% |
| learning | 10688 | 3.8% |
| business | 9657 | 3.5% |
| product | 9439 | 3.4% |
| or | 9439 | 3.4% |
| that | 9273 | 3.3% |
| Other values (47) | 162326 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 270522 | |
| n | 158935 | 8.3% |
| a | 154761 | 8.1% |
| e | 153868 | 8.0% |
| t | 132483 | 6.9% |
| i | 125681 | 6.6% |
| o | 122671 | 6.4% |
| r | 120312 | 6.3% |
| s | 97063 | 5.1% |
| d | 79170 | 4.1% |
| Other values (25) | 498359 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1913825 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1913825 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1913825 |
| Distinct | 1020 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 2936 |
| Missing (%) | 14.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 512 |
|---|---|
| Median length | 412 |
| Mean length | 153.9391 |
| Min length | 4 |
Unique
| Unique | 313 |
|---|---|
| Unique (%) | 1.9% |
Sample
| 1st row | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) |
|---|---|
| 2nd row | Kaggle (forums, blog, social media, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) |
| 3rd row | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) |
| 4th row | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other |
| 5th row | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) |
Words
| Value | Count | Frequency (%) |
| etc | 44329 | 14.0% |
| data | 15722 | 5.0% |
| science | 15722 | 5.0% |
| forums | 14506 | 4.6% |
| kaggle | 10751 | 3.4% |
| blog | 10751 | 3.4% |
| social | 10751 | 3.4% |
| media | 10751 | 3.4% |
| kdnuggets | 9907 | 3.1% |
| vidhya | 9907 | 3.1% |
| Other values (36) | 164129 |
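The word table counts `etc` 44,329 times because most option labels carry their own parenthesized examples, so it says little about how many respondents picked each source. Counting options instead requires splitting each answer on commas that sit outside parentheses. The minimal sketch below assumes that convention holds for every option in this column. The example row is the 4th sample row above.

```python
def split_options(answer: str) -> list[str]:
    """Split a multi-select answer on commas that are not inside parentheses."""
    options, current, depth = [], [], 0
    for ch in answer:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            options.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        options.append(tail)
    return options

row = ("YouTube (Cloud AI Adventures, Siraj Raval, etc), "
       "Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other")
print(split_options(row))
```

This yields three options (`YouTube (...)`, `Blogs (...)`, `Other`). A naive split on `", "` would cut the parenthesized examples apart.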
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 300445 | 11.6% |
| e | 194465 | 7.5% |
| a | 181119 | 7.0% |
| i | 150113 | 5.8% |
| , | 140412 | 5.4% |
| t | 138919 | 5.4% |
| s | 131026 | 5.1% |
| c | 129271 | 5.0% |
| o | 120553 | 4.7% |
| n | 104219 | 4.0% |
| Other values (39) | 992710 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2583252 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2583252 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2583252 |
| Distinct | 819 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 3148 |
| Missing (%) | 16.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 176 |
|---|---|
| Median length | 151 |
| Mean length | 40.252942 |
| Min length | 3 |
Unique
| Unique | 262 |
|---|---|
| Unique (%) | 1.6% |
Sample
| 1st row | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy |
|---|---|
| 2nd row | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy |
| 3rd row | Coursera, edX, DataCamp, University Courses (resulting in a university degree) |
| 4th row | Other |
| 5th row | None |
Words
| Value | Count | Frequency (%) |
| kaggle | 10238 | |
| courses | 9597 | |
| university | 8956 | 10.1% |
| coursera | 8685 | 9.8% |
| i.e | 5119 | 5.8% |
| learn | 5119 | 5.8% |
| udemy | 4804 | 5.4% |
| resulting | 4478 | 5.1% |
| in | 4478 | 5.1% |
| a | 4478 | 5.1% |
| Other values (10) | 22616 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 80252 | 12.0% |
| (space) | 71999 | 10.8% |
| r | 53153 | 8.0% |
| a | 48811 | 7.3% |
| s | 43576 | 6.5% |
| i | 39026 | 5.9% |
| g | 30715 | 4.6% |
| n | 29654 | 4.4% |
| u | 27981 | 4.2% |
| t | 25108 | 3.8% |
| Other values (25) | 216676 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 666951 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 666951 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 666951 |
Which of the following integrated development environments (IDE's) do you use on a regular basis?
Text
Missing 
| Distinct | 853 |
|---|---|
| Distinct (%) | 5.8% |
| Missing | 5090 |
| Missing (%) | 25.8% |
| Memory size | 154.2 KiB |
Length
| Max length | 185 |
|---|---|
| Median length | 160 |
| Mean length | 64.706023 |
| Min length | 4 |
Unique
| Unique | 274 |
|---|---|
| Unique (%) | 1.9% |
Sample
| 1st row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder |
|---|---|
| 2nd row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code |
| 3rd row | Jupyter (JupyterLab, Jupyter Notebooks, etc) |
| 4th row | RStudio , Other |
| 5th row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text |
Words
| Value | Count | Frequency (%) |
|  | 30693 | |
| jupyter | 21608 | |
| notebooks | 10804 | 8.0% |
| etc | 10804 | 8.0% |
| jupyterlab | 10804 | 8.0% |
| visual | 9068 | 6.7% |
| studio | 9068 | 6.7% |
| code | 4534 | 3.3% |
| rstudio | 4455 | 3.3% |
| pycharm | 4224 | 3.1% |
| Other values (10) | 19424 |
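In this column, options are separated by ` , ` (space, comma, space). The label `Jupyter (JupyterLab, Jupyter Notebooks, etc)` contains the word `jupyter` twice, which is why `jupyter` (21,608) is exactly twice `jupyterlab` (10,804) in the word table. Counting each option once per respondent avoids that double count. The sketch below runs over the five sample rows above and assumes ` , ` never appears inside an option label.

```python
from collections import Counter

# The five sample rows shown above for this question.
rows = [
    "Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder",
    "Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code",
    "Jupyter (JupyterLab, Jupyter Notebooks, etc)",
    "RStudio , Other",
    "Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text",
]

def option_counts(rows: list[str], sep: str = " , ") -> Counter:
    """Count how many rows selected each option (each option once per row)."""
    counts: Counter = Counter()
    for row in rows:
        counts.update({opt.strip() for opt in row.split(sep) if opt.strip()})
    return counts

JUPYTER = "Jupyter (JupyterLab, Jupyter Notebooks, etc)"
options = option_counts(rows)
word_hits = sum(row.lower().split().count("jupyter") for row in rows)
print(options[JUPYTER], word_hits)
```

Four of the five rows select Jupyter, but the word `jupyter` appears eight times, mirroring the 2:1 ratio in the word table.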
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 183434 | |
| t | 75657 | 8.0% |
| e | 71159 | 7.5% |
| u | 57658 | 6.1% |
| o | 55475 | 5.9% |
| , | 45985 | 4.9% |
| r | 40412 | 4.3% |
| y | 39721 | 4.2% |
| p | 38778 | 4.1% |
| J | 32412 | 3.4% |
| Other values (29) | 305764 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 946455 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 946455 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 946455 |
| Distinct | 248 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 5274 |
| Missing (%) | 26.7% |
| Memory size | 154.2 KiB |
Length
| Max length | 295 |
|---|---|
| Median length | 254 |
| Mean length | 29.514851 |
| Min length | 4 |
Unique
| Unique | 100 |
|---|---|
| Unique (%) | 0.7% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Microsoft Azure Notebooks |
| 3rd row | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) |
| 4th row | None |
| 5th row | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub |
Words
| Value | Count | Frequency (%) |
|  | 7815 | |
| notebooks | 7214 | |
|  | 5672 | 9.4% |
| none | 5177 | 8.5% |
| kernels | 4845 | 8.0% |
| kaggle | 4845 | 8.0% |
| colab | 4551 | 7.5% |
| products | 1878 | 3.1% |
| etc | 1878 | 3.1% |
| notebook | 1878 | 3.1% |
| Other values (20) | 14831 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 68912 | |
| o | 55693 | |
| e | 43123 | 10.1% |
| l | 23377 | 5.5% |
| t | 19566 | 4.6% |
| a | 16565 | 3.9% |
| b | 16546 | 3.9% |
| g | 16119 | 3.8% |
| s | 15603 | 3.7% |
| r | 14417 | 3.4% |
| Other values (34) | 136362 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 426283 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 426283 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 426283 |
What programming languages do you use on a regular basis?
Text
Missing 
| Distinct | 611 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 5313 |
| Missing (%) | 26.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 70 |
|---|---|
| Median length | 60 |
| Mean length | 14.848792 |
| Min length | 1 |
Unique
| Unique | 215 |
|---|---|
| Unique (%) | 1.5% |
Sample
| 1st row | Python, R, SQL, Java, Javascript, MATLAB |
|---|---|
| 2nd row | Python, R, SQL, Bash |
| 3rd row | Python, SQL |
| 4th row | Python, R |
| 5th row | Python, R, Bash |
Words
| Value | Count | Frequency (%) |
| python | 12841 | |
| sql | 6532 | |
| r | 4588 | 12.2% |
| c | 3928 | 10.5% |
| java | 2267 | 6.0% |
| javascript | 2174 | 5.8% |
| bash | 2037 | 5.4% |
| matlab | 1516 | 4.0% |
| other | 1148 | 3.1% |
| typescript | 389 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| , | 23099 | 10.8% |
| (space) | 23099 | 10.8% |
| t | 16552 | 7.7% |
| h | 16026 | 7.5% |
| y | 13230 | 6.2% |
| o | 12924 | 6.0% |
| n | 12924 | 6.0% |
| P | 12841 | 6.0% |
| a | 10919 | 5.1% |
| L | 8048 | 3.8% |
| Other values (19) | 64220 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 213882 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 213882 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 213882 |
| Distinct | 439 |
|---|---|
| Distinct (%) | 3.1% |
| Missing | 5464 |
| Missing (%) | 27.7% |
| Memory size | 154.2 KiB |
Length
| Max length | 141 |
|---|---|
| Median length | 130 |
| Mean length | 30.0174 |
| Min length | 4 |
Unique
| Unique | 165 |
|---|---|
| Unique (%) | 1.2% |
Sample
| 1st row | Matplotlib |
|---|---|
| 2nd row | Ggplot / ggplot2 , Matplotlib , Seaborn |
| 3rd row | Matplotlib , Plotly / Plotly Express , Seaborn |
| 4th row | Ggplot / ggplot2 |
| 5th row | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn |
Words
| Value | Count | Frequency (%) |
|  | 24947 | |
| matplotlib | 10516 | |
| seaborn | 6905 | 10.3% |
| plotly | 6434 | 9.6% |
| ggplot | 4182 | 6.2% |
| ggplot2 | 4182 | 6.2% |
| express | 3217 | 4.8% |
| shiny | 1244 | 1.8% |
| none | 1240 | 1.8% |
| d3.js | 1078 | 1.6% |
| Other values (6) | 3419 | 5.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 95205 | |
| l | 44819 | |
| t | 37656 | 8.8% |
| o | 36340 | 8.5% |
| p | 22741 | 5.3% |
| a | 18138 | 4.2% |
| b | 18065 | 4.2% |
| , | 16998 | 4.0% |
| e | 14614 | 3.4% |
| i | 13121 | 3.1% |
| Other values (28) | 110141 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 427838 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 427838 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 427838 |
Which types of specialized hardware do you use on a regular basis?
Categorical
Missing 
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 5499 |
| Missing (%) | 27.9% |
| Memory size | 154.2 KiB |
| CPUs, GPUs | 5041 |
|---|---|
| CPUs | 5001 |
| None / I do not know | 2449 |
| GPUs | 1129 |
| CPUs, GPUs, TPUs | 348 |
| Other values (9) | 250 |
Length
| Max length | 23 |
|---|---|
| Median length | 20 |
| Mean length | 9.2723308 |
| Min length | 4 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | CPUs, GPUs |
|---|---|
| 2nd row | CPUs, GPUs |
| 3rd row | CPUs, GPUs |
| 4th row | CPUs, GPUs |
| 5th row | CPUs, GPUs |
Common Values
| Value | Count | Frequency (%) |
| CPUs, GPUs | 5041 | 25.6% |
| CPUs | 5001 | 25.4% |
| None / I do not know | 2449 | 12.4% |
| GPUs | 1129 | 5.7% |
| CPUs, GPUs, TPUs | 348 | 1.8% |
| GPUs, TPUs | 82 | 0.4% |
| Other | 50 | 0.3% |
| TPUs | 30 | 0.2% |
| CPUs, TPUs | 30 | 0.2% |
| CPUs, GPUs, Other | 27 | 0.1% |
| Other values (4) | 31 | 0.2% |
| (Missing) | 5499 | 27.9% |
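The Common Values here are combinations, so per-device usage has to be summed across them. The sketch below covers only the itemized rows. The four combinations grouped under Other values are not broken out, so the results are lower bounds. For CPUs the result is 10,447, just under the 10,472 `cpus` tokens in the word table. Each row mentions CPUs at most once, so the gap must come from those unitemized rows.

```python
from collections import Counter

# Itemized combination counts from the Common Values table above.
combos = {
    "CPUs, GPUs": 5041, "CPUs": 5001, "None / I do not know": 2449, "GPUs": 1129,
    "CPUs, GPUs, TPUs": 348, "GPUs, TPUs": 82, "Other": 50, "TPUs": 30,
    "CPUs, TPUs": 30, "CPUs, GPUs, Other": 27,
}

def per_item_counts(combos: dict[str, int], sep: str = ", ") -> Counter:
    """Number of respondents whose combination includes each individual item."""
    totals: Counter = Counter()
    for combo, n in combos.items():
        for item in combo.split(sep):
            totals[item] += n
    return totals

totals = per_item_counts(combos)
print(totals["CPUs"], totals["GPUs"], totals["TPUs"])
```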
Words
| Value | Count | Frequency (%) |
| cpus | 10472 | |
| gpus | 6638 | |
| none | 2449 | 7.6% |
|  | 2449 | 7.6% |
| i | 2449 | 7.6% |
| do | 2449 | 7.6% |
| not | 2449 | 7.6% |
| know | 2449 | 7.6% |
| tpus | 496 | 1.5% |
| other | 108 | 0.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 18190 | |
| P | 17606 | |
| U | 17606 | |
| s | 17606 | |
| C | 10472 | |
| o | 9796 | |
| n | 7347 | |
| G | 6638 | 5.0% |
| , | 5945 | 4.5% |
| t | 2557 | 1.9% |
| Other values (11) | 18071 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 131834 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 131834 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 131834 |
| Distinct | 684 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 5629 |
| Missing (%) | 28.5% |
| Memory size | 154.2 KiB |
Length
| Max length | 336 |
|---|---|
| Median length | 288 |
| Mean length | 101.29813 |
| Min length | 4 |
Unique
| Unique | 232 |
|---|---|
| Unique (%) | 1.6% |
Sample
| 1st row | Linear or Logistic Regression |
|---|---|
| 2nd row | Linear or Logistic Regression, Convolutional Neural Networks |
| 3rd row | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) |
| 4th row | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks |
| 5th row | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks |
Words
| Value | Count | Frequency (%) |
| or | 18713 | 10.5% |
| networks | 14046 | 7.9% |
| neural | 12162 | 6.8% |
| linear | 10223 | 5.7% |
| logistic | 10223 | 5.7% |
| regression | 10223 | 5.7% |
| etc | 9769 | 5.5% |
| decision | 8490 | 4.8% |
| trees | 8490 | 4.8% |
| random | 8490 | 4.8% |
| Other values (20) | 67134 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 163875 | 11.5% |
| e | 139846 | 9.8% |
| o | 125597 | 8.8% |
| s | 112138 | 7.9% |
| r | 106194 | 7.4% |
| i | 92023 | 6.4% |
| n | 79316 | 5.6% |
| t | 76694 | 5.4% |
| a | 64148 | 4.5% |
| , | 46623 | 3.3% |
| Other values (33) | 420634 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1427088 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1427088 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1427088 |
Which categories of ML tools do you use on a regular basis?
Text
Missing 
| Distinct | 92 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 5802 |
| Missing (%) | 29.4% |
| Memory size | 154.2 KiB |
Length
| Max length | 374 |
|---|---|
| Median length | 4 |
| Mean length | 44.514625 |
| Min length | 4 |
Unique
| Unique | 15 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| 3rd row | None |
| 4th row | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| 5th row | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
Words
| Value | Count | Frequency (%) |
| e.g | 9911 | 13.4% |
| automated | 8733 | 11.8% |
| none | 7822 | 10.6% |
| model | 3650 | 4.9% |
| selection | 3200 | 4.3% |
| auto-sklearn | 3200 | 4.3% |
| xcessiv | 3200 | 4.3% |
| data | 1800 | 2.4% |
| augmentation | 1800 | 2.4% |
| imgaug | 1800 | 2.4% |
| Other values (24) | 28741 |
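The word table counts tokens across all non-missing answers, so one multi-select answer contributes many words. For example, `e.g` appears once for every option listed. The profiler's exact tokenizer is not shown in the report. The sketch assumes three steps: lowercasing, splitting on whitespace, and trimming punctuation from each token's ends. Those steps reproduce tokens such as `e.g`, `auto-sklearn` and `ray.tune` from the sample rows. `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter

def word_counts(values):
    """Count lowercase whitespace tokens with leading/trailing punctuation trimmed."""
    counts = Counter()
    for value in values:
        if value is None:  # missing answers contribute no words
            continue
        for token in value.lower().split():
            word = token.strip(string.punctuation)  # "(e.g." -> "e.g"
            if word:  # a bare "," trims to "" and is skipped in this sketch
                counts[word] += 1
    return counts

sample = [
    "Automated model selection (e.g. auto-sklearn, xcessiv), "
    "Automated hyperparameter tuning (e.g. hyperopt, ray.tune)",
    "None",
    None,
]
print(word_counts(sample).most_common(3))
```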
Most occurring characters
| Value | Count | Frequency (%) |
| e | 74310 | 12.0% |
| (space) | 59942 | 9.7% |
| t | 52616 | 8.5% |
| o | 43566 | 7.0% |
| a | 39055 | 6.3% |
| n | 35582 | 5.7% |
| u | 27883 | 4.5% |
| i | 23255 | 3.8% |
| . | 21600 | 3.5% |
| s | 21439 | 3.5% |
| Other values (31) | 220173 | 35.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 619421 |
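Percentages in the character table are relative to the total number of characters in the non-missing answers. That total is the single count in the categories table, 619,421 here. For example, `e` gives 74,310 / 619,421 ≈ 12.0%. The sketch below does that calculation with `collections.Counter`. `char_table` is a hypothetical helper, and it rounds to one decimal place to mirror the table's display.

```python
from collections import Counter

def char_table(values, top=10):
    """(character, count, percent) rows; percent is relative to all characters counted."""
    counts = Counter()
    for value in values:
        if value is not None:
            counts.update(value)  # iterating a str yields its characters, spaces included
    total = sum(counts.values())
    rows = [(ch, n, round(100 * n / total, 1)) for ch, n in counts.most_common(top)]
    return rows, total

# Cross-check the "e" row above against the reported total of 619421 characters.
print(round(100 * 74310 / 619421, 1))  # 12.0
```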
Which categories of computer vision methods do you use on a regular basis?
Categorical
Missing 
| Distinct | 49 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 14225 |
| Missing (%) | 72.1% |
| Memory size | 154.2 KiB |
Length
| Max length | 324 |
|---|---|
| Median length | 271 |
| Mean length | 136.52203 |
| Min length | 4 |
Unique
| Unique | 9 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) |
|---|---|
| 2nd row | None |
| 3rd row | General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc) |
| 4th row | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) |
| 5th row | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) |
Common Values
| Value | Count | Frequency (%) |
| None | 1203 | 6.1% |
| Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 560 | 2.8% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 366 | 1.9% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc), Generative Networks (GAN, VAE, etc) | 341 | 1.7% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 326 | 1.7% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 243 | 1.2% |
| Image segmentation methods (U-Net, Mask R-CNN, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 237 | 1.2% |
| General purpose image/video tools (PIL, cv2, skimage, etc) | 233 | 1.2% |
| Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 229 | 1.2% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 224 | 1.1% |
| Other values (39) | 1530 | 7.8% |
| (Missing) | 14225 | 72.1% |
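Each answer to this question is a single string that joins the selected options with ", ". As a result, the Common Values table treats every combination as its own category. A plain split on "," cannot recover the options, because the labels themselves contain commas inside parentheses, as in "(PIL, cv2, skimage, etc)". The sketch below splits only at commas outside parentheses. It assumes the parentheses in each answer are balanced. `split_options` is a hypothetical helper and not part of the profiler.

```python
from collections import Counter

def split_options(answer):
    """Split a multi-select answer at commas that sit outside parentheses."""
    options, current, depth = [], [], 0
    for ch in answer:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:  # separator between two options
            options.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    options.append("".join(current).strip())
    return [opt for opt in options if opt]

answer = (
    "General purpose image/video tools (PIL, cv2, skimage, etc), "
    "Image segmentation methods (U-Net, Mask R-CNN, etc), "
    "Object detection methods (YOLOv3, RetinaNet, etc)"
)
for option in split_options(answer):
    print(option)

# Per-option usage across a (hypothetical) list of answers.
usage = Counter(opt for a in [answer, "None"] for opt in split_options(a))
```

Counting split options this way gives per-option usage instead of per-combination counts.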
Words
| Value | Count | Frequency (%) |
| etc | 10408 | 11.0% |
| general | 5394 | 5.7% |
| purpose | 5394 | 5.7% |
| image | 5248 | 5.5% |
| networks | 4268 | 4.5% |
| methods | 3933 | 4.2% |
| other | 3238 | 3.4% |
| and | 3187 | 3.4% |
| classification | 3187 | 3.4% |
| inception | 3187 | 3.4% |
| Other values (22) | 47148 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 95383 | 12.7% |
| (space) | 89100 | 11.9% |
| t | 62987 | 8.4% |
| , | 41941 | 5.6% |
| o | 34913 | 4.7% |
| s | 34879 | 4.7% |
| n | 34666 | 4.6% |
| i | 32629 | 4.4% |
| a | 31692 | 4.2% |
| c | 29107 | 3.9% |
| Other values (36) | 262482 | 35.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 749779 |
Which of the following natural language processing (NLP) methods do you use on a regular basis?
Categorical
Missing 
| Distinct | 28 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 16135 |
| Missing (%) | 81.8% |
| Memory size | 154.2 KiB |
Length
| Max length | 210 |
|---|---|
| Median length | 170 |
| Mean length | 74.985204 |
| Min length | 4 |
Unique
| Unique | 5 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) |
|---|---|
| 2nd row | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) |
| 3rd row | Word embeddings/vectors (GLoVe, fastText, word2vec) |
| 4th row | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Contextualized embeddings (ELMo, CoVe), Transformer language models (GPT-2, BERT, XLnet, etc) |
| 5th row | None |
Common Values
| Value | Count | Frequency (%) |
| None | 1027 | 5.2% |
| Word embeddings/vectors (GLoVe, fastText, word2vec) | 616 | 3.1% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) | 498 | 2.5% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Contextualized embeddings (ELMo, CoVe), Transformer language models (GPT-2, BERT, XLnet, etc) | 268 | 1.4% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Transformer language models (GPT-2, BERT, XLnet, etc) | 250 | 1.3% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Transformer language models (GPT-2, BERT, XLnet, etc) | 230 | 1.2% |
| Encoder-decorder models (seq2seq, vanilla transformers) | 188 | 1.0% |
| Transformer language models (GPT-2, BERT, XLnet, etc) | 115 | 0.6% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Contextualized embeddings (ELMo, CoVe), Transformer language models (GPT-2, BERT, XLnet, etc) | 79 | 0.4% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Contextualized embeddings (ELMo, CoVe) | 76 | 0.4% |
| Other values (18) | 235 | 1.2% |
| (Missing) | 16135 | 81.8% |
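The Frequency (%) column in Common Values is computed over all 19,717 rows, missing ones included. For example, "None" with 1,027 answers is 1,027 / 19,717 ≈ 5.2%. Among the 3,582 people who answered this question (19,717 − 16,135), the same count is about 28.7%. Distinct (%) already uses that answered base (28 / 3,582 ≈ 0.8%). The arithmetic, as a small sketch:

```python
def percent(count, base):
    """Percentage of base, rounded to one decimal place."""
    return round(100 * count / base, 1)

TOTAL_ROWS = 19717   # observations in the dataset
MISSING_NLP = 16135  # rows with no answer to the NLP question
answered = TOTAL_ROWS - MISSING_NLP  # respondents who answered

none_count = 1027
print(percent(none_count, TOTAL_ROWS))  # 5.2, the base used in Common Values
print(percent(none_count, answered))    # 28.7, share among respondents who answered
print(percent(28, answered))            # 0.8, matches Distinct (%) for 28 distinct values
```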
Words
| Value | Count | Frequency (%) |
| models | 2399 | 8.6% |
| embeddings/vectors | 2115 | 7.6% |
| glove | 2115 | 7.6% |
| fasttext | 2115 | 7.6% |
| word2vec | 2115 | 7.6% |
| word | 2115 | 7.6% |
| seq2seq | 1368 | 4.9% |
| encoder-decorder | 1368 | 4.9% |
| vanilla | 1368 | 4.9% |
| transformers | 1368 | 4.9% |
| Other values (12) | 9510 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 31307 | 11.7% |
| (space) | 24374 | 9.1% |
| o | 18707 | 7.0% |
| r | 17695 | 6.6% |
| d | 16649 | 6.2% |
| s | 15809 | 5.9% |
| , | 11823 | 4.4% |
| n | 11463 | 4.3% |
| t | 10948 | 4.1% |
| a | 9874 | 3.7% |
| Other values (33) | 99948 | 37.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 268597 |
| Distinct | 584 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 5964 |
| Missing (%) | 30.2% |
| Memory size | 154.2 KiB |
Length
| Max length | 129 |
|---|---|
| Median length | 108 |
| Mean length | 36.025522 |
| Min length | 4 |
Unique
| Unique | 202 |
|---|---|
| Unique (%) | 1.5% |
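Distinct counts the different answer strings, and Unique counts those that occur exactly once. Both percentages use the non-missing answers as their base: 19,717 − 5,964 = 13,753. On that base, 584 / 13,753 ≈ 4.2% and 202 / 13,753 ≈ 1.5%, matching the table. Below is a minimal sketch of the same bookkeeping. `distinct_unique` is a hypothetical helper.

```python
from collections import Counter

def distinct_unique(values):
    """Distinct/unique counts and their percentages over the non-missing values."""
    present = [v for v in values if v is not None]  # drop missing answers first
    counts = Counter(present)
    distinct = len(counts)                              # different strings
    unique = sum(1 for n in counts.values() if n == 1)  # strings seen exactly once
    base = len(present)
    return {
        "distinct": distinct,
        "distinct_pct": round(100 * distinct / base, 1),
        "unique": unique,
        "unique_pct": round(100 * unique / base, 1),
    }

answers = ["None", "None", "Scikit-learn , Keras", "Scikit-learn , PyTorch", None]
print(distinct_unique(answers))
```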
Sample
| 1st row | None |
|---|---|
| 2nd row | Scikit-learn , TensorFlow , Keras , RandomForest |
| 3rd row | Scikit-learn , RandomForest, Xgboost , LightGBM |
| 4th row | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret |
| 5th row | Scikit-learn , TensorFlow , Keras , PyTorch |
Words
| Value | Count | Frequency (%) |
|  | 23108 | |
| scikit-learn | 9390 | |
| tensorflow | 5822 | 9.0% |
| keras | 5756 | 8.9% |
| randomforest | 4524 | 7.0% |
| xgboost | 4243 | 6.6% |
| pytorch | 3412 | 5.3% |
| lightgbm | 2166 | 3.4% |
| none | 1720 | 2.7% |
| caret | 1139 | 1.8% |
| Other values (4) | 3111 | 4.8% |
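The answers here join options with both " , " and ", ". Compare the sample row "Scikit-learn , RandomForest, Xgboost , LightGBM". Word-level counting would also split any multi-word option name apart. The framework names in the sample rows contain no commas, so splitting on "," and trimming whitespace should recover whole options. The sketch below relies on that assumption. `option_counts` is a hypothetical helper.

```python
from collections import Counter

def option_counts(values):
    """Count options in comma-joined answers, tolerating ' , ' and ', ' separators."""
    counts = Counter()
    for value in values:
        if value is None:  # skip missing answers
            continue
        for option in value.split(","):
            option = option.strip()  # removes the stray spaces around each comma
            if option:
                counts[option] += 1
    return counts

answers = [
    "Scikit-learn , TensorFlow , Keras , RandomForest",
    "Scikit-learn , RandomForest, Xgboost , LightGBM",
    None,
]
print(option_counts(answers).most_common(2))
```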
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 114840 | 23.2% |
| o | 34310 | 6.9% |
| r | 31295 | 6.3% |
| e | 28693 | 5.8% |
| , | 26620 | 5.4% |
| a | 23617 | 4.8% |
| i | 22805 | 4.6% |
| t | 22753 | 4.6% |
| n | 21456 | 4.3% |
| s | 21294 | 4.3% |
| Other values (27) | 147776 | 29.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 495459 |
| Distinct | 183 |
|---|---|
| Distinct (%) | 2.6% |
| Missing | 12592 |
| Missing (%) | 63.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 189 |
|---|---|
| Median length | 170 |
| Mean length | 26.534316 |
| Min length | 4 |
Unique
| Unique | 82 |
|---|---|
| Unique (%) | 1.2% |
Sample
| 1st row | Microsoft Azure |
|---|---|
| 2nd row | Amazon Web Services (AWS) |
| 3rd row | Google Cloud Platform (GCP) , Amazon Web Services (AWS) , Microsoft Azure |
| 4th row | None |
| 5th row | Google Cloud Platform (GCP) |
Words
| Value | Count | Frequency (%) |
| cloud | 3233 | |
| web | 2758 | |
| services | 2758 | |
| aws | 2758 | |
| amazon | 2758 | |
|  | 2621 | |
| none | 2229 | |
|  | 2134 | |
| platform | 2134 | |
| gcp | 2134 | |
| Other values (11) | 4059 |
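To relate platform choices to other questions, a common step is to expand each answer into one 0/1 indicator column per option. The sketch below does this with the standard library. It assumes, as in the sample rows, that option names contain no commas. It also turns a missing answer into an all-zero row, which cannot be told apart from selecting nothing unless a separate missing flag is kept. `indicator_rows` is a hypothetical helper. In pandas, `Series.str.get_dummies(sep=" , ")` should do the same when every answer uses exactly that separator.

```python
def indicator_rows(values):
    """Expand comma-joined multi-select answers into 0/1 indicator rows."""
    parsed = [
        {opt.strip() for opt in value.split(",") if opt.strip()} if value is not None else set()
        for value in values
    ]
    columns = sorted(set().union(*parsed))  # one column per option seen anywhere
    rows = [[int(col in answer) for col in columns] for answer in parsed]
    return columns, rows

answers = [
    "Microsoft Azure",
    "Google Cloud Platform (GCP) , Amazon Web Services (AWS) , Microsoft Azure",
    None,  # missing answer becomes an all-zero row
]
columns, rows = indicator_rows(answers)
print(columns)
for row in rows:
    print(row)
```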
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 34524 | 18.3% |
| o | 17449 | 9.2% |
| e | 14804 | 7.8% |
| r | 8222 | 4.3% |
| l | 7888 | 4.2% |
| A | 7072 | 3.7% |
| S | 5721 | 3.0% |
| a | 5638 | 3.0% |
| W | 5516 | 2.9% |
| C | 5367 | 2.8% |
| Other values (28) | 76856 | 40.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 189057 |
| Distinct | 336 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 12617 |
| Missing (%) | 64.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 231 |
|---|---|
| Median length | 224 |
| Mean length | 27.010986 |
| Min length | 4 |
Unique
| Unique | 155 |
|---|---|
| Unique (%) | 2.2% |
Sample
| 1st row | Azure Virtual Machines, Azure Container Service |
|---|---|
| 2nd row | AWS Elastic Compute Cloud (EC2) |
| 3rd row | Google Compute Engine (GCE), AWS Lambda, Azure Virtual Machines |
| 4th row | None |
| 5th row | AWS Elastic Compute Cloud (EC2) |
Words
| Value | Count | Frequency (%) |
| aws | 3281 | |
| none | 3155 | |
|  | 2963 | |
| compute | 2948 | |
| cloud | 2512 | |
| engine | 2261 | 7.7% |
| elastic | 2121 | 7.2% |
| ec2 | 1810 | 6.2% |
| azure | 1233 | 4.2% |
| gce | 1138 | 3.9% |
| Other values (11) | 6001 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 22323 | 11.6% |
| e | 16711 | 8.7% |
| o | 15638 | 8.2% |
| n | 11546 | 6.0% |
| C | 8803 | 4.6% |
| u | 8759 | 4.6% |
| l | 8745 | 4.6% |
| t | 8364 | 4.4% |
| i | 7550 | 3.9% |
| E | 7330 | 3.8% |
| Other values (29) | 76009 | 39.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 191778 |
| Distinct | 287 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 12639 |
| Missing (%) | 64.1% |
| Memory size | 154.2 KiB |
Length
| Max length | 173 |
|---|---|
| Median length | 4 |
| Mean length | 13.882311 |
| Min length | 4 |
Unique
| Unique | 143 |
|---|---|
| Unique (%) | 2.0% |
Sample
| 1st row | Databricks, Microsoft Analysis Services |
|---|---|
| 2nd row | AWS Elastic MapReduce |
| 3rd row | Google BigQuery, Databricks |
| 4th row | None |
| 5th row | Google Cloud Dataflow |
Words
| Value | Count | Frequency (%) |
| none | 4133 | |
|  | 1881 | |
| aws | 1641 | 10.9% |
| bigquery | 958 | 6.4% |
| cloud | 923 | 6.2% |
| databricks | 604 | 4.0% |
| redshift | 562 | 3.7% |
| dataflow | 525 | 3.5% |
| elastic | 429 | 2.9% |
| mapreduce | 429 | 2.9% |
| Other values (8) | 2914 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 10482 | 10.7% |
| o | 10195 | 10.4% |
| (space) | 7921 | 8.1% |
| n | 5209 | 5.3% |
| a | 4877 | 5.0% |
| i | 4393 | 4.5% |
| l | 4184 | 4.3% |
| N | 4133 | 4.2% |
| s | 3861 | 3.9% |
| t | 3503 | 3.6% |
| Other values (30) | 39501 | 40.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 98259 |
| Distinct | 272 |
|---|---|
| Distinct (%) | 3.9% |
| Missing | 12667 |
| Missing (%) | 64.2% |
| Memory size | 154.2 KiB |
Length
| Max length | 219 |
|---|---|
| Median length | 4 |
| Mean length | 16.253617 |
| Min length | 3 |
Unique
| Unique | 126 |
|---|---|
| Unique (%) | 1.8% |
Sample
| 1st row | Azure Machine Learning Studio |
|---|---|
| 2nd row | RapidMiner |
| 3rd row | SAS, Azure Machine Learning Studio, Google Cloud Machine Learning Engine |
| 4th row | None |
| 5th row | Google Cloud Translation |
Words
| Value | Count | Frequency (%) |
| none | 4313 | |
| cloud | 2111 | |
|  | 2111 | |
| machine | 1167 | 6.8% |
| learning | 1167 | 6.8% |
| engine | 586 | 3.4% |
| azure | 581 | 3.4% |
| studio | 581 | 3.4% |
| amazon | 569 | 3.3% |
| sagemaker | 569 | 3.3% |
| Other values (9) | 3283 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 13685 | 11.9% |
| o | 13339 | 11.6% |
| n | 11222 | 9.8% |
| (space) | 9988 | 8.7% |
| a | 6954 | 6.1% |
| l | 5355 | 4.7% |
| g | 5233 | 4.6% |
| i | 5090 | 4.4% |
| N | 4713 | 4.1% |
| u | 4491 | 3.9% |
| Other values (24) | 34518 | 30.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 114588 |
Which automated machine learning tools (or partial AutoML tools) do you use on a regular basis?
Text
Missing 
| Distinct | 201 |
|---|---|
| Distinct (%) | 2.9% |
| Missing | 12702 |
| Missing (%) | 64.4% |
| Memory size | 154.2 KiB |
Length
| Max length | 153 |
|---|---|
| Median length | 4 |
| Mean length | 9.4597292 |
| Min length | 4 |
Unique
| Unique | 100 |
|---|---|
| Unique (%) | 1.4% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Auto-Keras |
| 3rd row | Google AutoML , Tpot , Auto-Keras , Auto-Sklearn , Auto_ml |
| 4th row | None |
| 5th row | Google AutoML |
Words
| Value | Count | Frequency (%) |
| none | 5175 | |
|  | 1266 | 11.6% |
| automl | 860 | 7.8% |
| auto-sklearn | 756 | 6.9% |
|  | 498 | 4.5% |
| auto-keras | 465 | 4.2% |
| auto_ml | 279 | 2.5% |
| h20 | 277 | 2.5% |
| driverless | 277 | 2.5% |
| ai | 277 | 2.5% |
| Other values (6) | 831 | 7.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 10742 | 16.2% |
| o | 9182 | 13.8% |
| e | 7608 | 11.5% |
| n | 5931 | 8.9% |
| N | 5175 | 7.8% |
| t | 3201 | 4.8% |
| A | 2637 | 4.0% |
| u | 2360 | 3.6% |
| r | 2098 | 3.2% |
| a | 1945 | 2.9% |
| Other values (29) | 15481 | 23.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 66360 |
| Distinct | 454 |
|---|---|
| Distinct (%) | 6.5% |
| Missing | 12723 |
| Missing (%) | 64.5% |
| Memory size | 154.2 KiB |
Length
| Max length | 168 |
|---|---|
| Median length | 146 |
| Mean length | 24.700743 |
| Min length | 4 |
Unique
| Unique | 196 |
|---|---|
| Unique (%) | 2.8% |
Sample
| 1st row | Azure SQL Database |
|---|---|
| 2nd row | PostgresSQL, AWS Relational Database Service |
| 3rd row | MySQL, PostgresSQL |
| 4th row | MySQL |
| 5th row | MySQL |
Words
| Value | Count | Frequency (%) |
| mysql | 3122 | |
| sql | 2857 | |
| microsoft | 2399 | |
| database | 2259 | |
| postgressql | 2160 | |
| server | 1852 | |
| sqlite | 1527 | 6.5% |
| none | 1245 | 5.3% |
| oracle | 1192 | 5.1% |
| aws | 1003 | 4.3% |
| Other values (8) | 3956 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 16578 | 9.6% |
| e | 15690 | 9.1% |
| S | 13109 | 7.6% |
| r | 10809 | 6.3% |
| o | 10784 | 6.2% |
| s | 10072 | 5.8% |
| Q | 9666 | 5.6% |
| L | 9666 | 5.6% |
| a | 9560 | 5.5% |
| t | 9220 | 5.3% |
| Other values (26) | 57603 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 172757 |
Correlations
|  | Approximately how many individuals are responsible for data science workloads at your place of business? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | Does your current employer incorporate machine learning methods into their business? | For how many years have you used machine learning methods? | Have you ever used a TPU (tensor processing unit)? | How long have you been writing code to analyze data (at work or at school)? | Select the title most similar to your current role (or most recent title if retired) | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | What is the size of the company where you are employed? | What is your age (# years)? | What is your current yearly compensation (approximate $USD)? | What is your gender? | What programming language would you recommend an aspiring data scientist to learn first? | Which categories of computer vision methods do you use on a regular basis? | Which of the following natural language processing (NLP) methods do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Approximately how many individuals are responsible for data science workloads at your place of business? | 1.000 | 0.155 | 0.244 | 0.112 | 0.034 | 0.118 | 0.107 | 0.057 | 0.302 | 0.038 | 0.108 | 0.014 | 0.025 | 0.000 | 0.055 | 0.036 |
| Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | 0.155 | 1.000 | 0.173 | 0.158 | 0.073 | 0.153 | 0.091 | 0.046 | 0.103 | 0.096 | 0.201 | 0.036 | 0.037 | 0.052 | 0.051 | 0.089 |
| Does your current employer incorporate machine learning methods into their business? | 0.244 | 0.173 | 1.000 | 0.201 | 0.052 | 0.162 | 0.161 | 0.066 | 0.120 | 0.059 | 0.141 | 0.030 | 0.041 | 0.059 | 0.090 | 0.099 |
| For how many years have you used machine learning methods? | 0.112 | 0.158 | 0.201 | 1.000 | 0.089 | 0.475 | 0.180 | 0.161 | 0.035 | 0.185 | 0.165 | 0.046 | 0.048 | 0.089 | 0.096 | 0.109 |
| Have you ever used a TPU (tensor processing unit)? | 0.034 | 0.073 | 0.052 | 0.089 | 1.000 | 0.053 | 0.039 | 0.021 | 0.024 | 0.034 | 0.055 | 0.041 | 0.036 | 0.114 | 0.109 | 0.278 |
| How long have you been writing code to analyze data (at work or at school)? | 0.118 | 0.153 | 0.162 | 0.475 | 0.053 | 1.000 | 0.188 | 0.152 | 0.058 | 0.271 | 0.205 | 0.048 | 0.066 | 0.069 | 0.078 | 0.077 |
| Select the title most similar to your current role (or most recent title if retired) | 0.107 | 0.091 | 0.161 | 0.180 | 0.039 | 0.188 | 1.000 | 0.182 | 0.050 | 0.194 | 0.069 | 0.056 | 0.075 | 0.044 | 0.062 | 0.070 |
| What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | 0.057 | 0.046 | 0.066 | 0.161 | 0.021 | 0.152 | 0.182 | 1.000 | 0.067 | 0.186 | 0.086 | 0.080 | 0.051 | 0.065 | 0.000 | 0.036 |
| What is the size of the company where you are employed? | 0.302 | 0.103 | 0.120 | 0.035 | 0.024 | 0.058 | 0.050 | 0.067 | 1.000 | 0.083 | 0.139 | 0.013 | 0.039 | 0.018 | 0.043 | 0.042 |
| What is your age (# years)? | 0.038 | 0.096 | 0.059 | 0.185 | 0.034 | 0.271 | 0.194 | 0.186 | 0.083 | 1.000 | 0.149 | 0.062 | 0.061 | 0.040 | 0.030 | 0.029 |
| What is your current yearly compensation (approximate $USD)? | 0.108 | 0.201 | 0.141 | 0.165 | 0.055 | 0.205 | 0.069 | 0.086 | 0.139 | 0.149 | 1.000 | 0.059 | 0.040 | 0.055 | 0.050 | 0.045 |
| What is your gender? | 0.014 | 0.036 | 0.030 | 0.046 | 0.041 | 0.048 | 0.056 | 0.080 | 0.013 | 0.062 | 0.059 | 1.000 | 0.053 | 0.088 | 0.041 | 0.110 |
| What programming language would you recommend an aspiring data scientist to learn first? | 0.025 | 0.037 | 0.041 | 0.048 | 0.036 | 0.066 | 0.075 | 0.051 | 0.039 | 0.061 | 0.040 | 0.053 | 1.000 | 0.060 | 0.078 | 0.063 |
| Which categories of computer vision methods do you use on a regular basis? | 0.000 | 0.052 | 0.059 | 0.089 | 0.114 | 0.069 | 0.044 | 0.065 | 0.018 | 0.040 | 0.055 | 0.088 | 0.060 | 1.000 | 0.139 | 0.174 |
| Which of the following natural language processing (NLP) methods do you use on a regular basis? | 0.055 | 0.051 | 0.090 | 0.096 | 0.109 | 0.078 | 0.062 | 0.000 | 0.043 | 0.030 | 0.050 | 0.041 | 0.078 | 0.139 | 1.000 | 0.099 |
| Which types of specialized hardware do you use on a regular basis? | 0.036 | 0.089 | 0.099 | 0.109 | 0.278 | 0.077 | 0.070 | 0.036 | 0.042 | 0.029 | 0.045 | 0.110 | 0.063 | 0.174 | 0.099 | 1.000 |
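The matrix above pairs categorical survey questions with coefficients between 0 and 1 and a diagonal of 1.000. The label naming the measure is not part of this section; assuming it is Cramér's V (the association measure ydata-profiling documents for categorical pairs), a minimal stdlib sketch of the uncorrected statistic follows. The profiler may apply a bias correction, so values computed this way are not expected to match the table exactly. The function name `cramers_v` is illustrative, and inputs are assumed to have missing answers already removed.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences.

    Assumption: missing answers were dropped beforehand; a None here would
    simply be treated as one more category.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    x_counts = Counter(xs)          # row marginals of the contingency table
    y_counts = Counter(ys)          # column marginals
    joint = Counter(zip(xs, ys))    # observed cell counts
    # Pearson chi-square summed over every cell, including empty ones.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


ages = ["18-21", "18-21", "25-29", "25-29"]
titles = ["Student", "Student", "Data Scientist", "Data Scientist"]
print(cramers_v(ages, titles))  # 1.0: each age band maps to exactly one title
```

A value of 1 means one answer fully determines the other; 0 means the observed counts equal those expected under independence.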
Missing values
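The overview's missing-cell figure follows from a simple tally: 221,379 missing cells out of 19,717 rows × 34 variables = 670,378 cells is about 33.0%. A minimal stdlib sketch of that per-column and overall count, assuming each row is a dict and that a missing answer is stored as Python `None` (shown as NaN in the tables). The literal string "None" that appears in the sample is a survey answer, not a missing value, so it is not counted. The function name `missing_summary` is hypothetical.

```python
def missing_summary(rows, columns):
    """Count missing cells (None or absent keys) per column and overall."""
    per_column = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            # The string "None" is a real answer; only the None object counts.
            if row.get(col) is None:
                per_column[col] += 1
    total_cells = len(rows) * len(columns)
    total_missing = sum(per_column.values())
    share = total_missing / total_cells if total_cells else 0.0
    return per_column, total_missing, share


rows = [
    {"age": "22-24", "hosted_notebooks": None},
    {"age": "40-44", "hosted_notebooks": "None"},
]
print(missing_summary(rows, ["age", "hosted_notebooks"]))
# ({'age': 0, 'hosted_notebooks': 1}, 1, 0.25)

# Overview check: 221,379 missing cells out of 19,717 x 34.
print(f"{221379 / (19717 * 34):.1%}")  # 33.0%
```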
Sample
First rows
| | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Select any activities that make up an important part of your role at work: | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which categories of computer vision methods do you use on a regular basis? | Which of the following natural language processing (NLP) methods do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis? | Which of the following cloud computing platforms do you use on a regular basis? | Which specific cloud computing products do you use on a regular basis? | Which specific big data / analytics products do you use on a regular basis? | Which of the following machine learning products do you use on a regular basis? | Which automated machine learning tools (or partial AutoML tools) do you use on a regular basis? | Which of the following relational database products do you use on a regular basis? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22-24 | Male | France | Master’s degree | Software Engineer | 1000-9,999 employees | 0 | I do not know | 30,000-39,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 | 1-2 years | Python | Never | 1-2 years | NaN | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder | None | Python, R, SQL, Java, Javascript, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression | None | NaN | NaN | None | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 40-44 | Male | India | Professional degree | Software Engineer | > 10,000 employees | 20+ | We have well established ML methods (i.e., models in production for more than 2 years) | 5,000-7,499 | > $100,000 ($USD) | Cloud-based data software & APIs (AWS, GCP, Azure, etc.), -1, -1, -1, -1, 0 | I have never written code | NaN | NaN | NaN | Analyze and understand data to influence product or business decisions, Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows | Kaggle (forums, blog, social media, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 55-59 | Female | Germany | Professional degree | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 40-44 | Male | Australia | Master’s degree | Other | > 10,000 employees | 20+ | I do not know | 250,000-299,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Once | 2-3 years | NaN | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, edX, DataCamp, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL, Bash | Ggplot / ggplot2 , Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Convolutional Neural Networks | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | NaN | Scikit-learn , TensorFlow , Keras , RandomForest | Microsoft Azure | Azure Virtual Machines, Azure Container Service | Databricks, Microsoft Analysis Services | Azure Machine Learning Studio | None | Azure SQL Database |
| 4 | 22-24 | Male | India | Bachelor’s degree | Other | 0-49 employees | 0 | No (we do not use ML methods) | 4,000-4,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 | < 1 years | Python | Never | < 1 years | NaN | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other | Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) | Python, SQL | Matplotlib , Plotly / Plotly Express , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) | None | NaN | NaN | Scikit-learn , RandomForest, Xgboost , LightGBM | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 50-54 | Male | France | Master’s degree | Data Scientist | 0-49 employees | 3-4 | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $10,000-$99,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1 | 20+ years | Java | Never | 10-15 years | Build prototypes to explore applying machine learning to new areas, Do research that advances the state of the art of machine learning | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | None | RStudio , Other | None | Python, R | Ggplot / ggplot2 | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | None | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret | Amazon Web Services (AWS) | AWS Elastic Compute Cloud (EC2) | AWS Elastic MapReduce | RapidMiner | Auto-Keras | PostgresSQL, AWS Relational Database Service |
| 6 | 22-24 | Male | India | Master’s degree | Data Scientist | 50-249 employees | 20+ | We are exploring ML methods (and may one day put a model into production) | 10,000-14,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1 | 3-5 years | Python | 6-24 times | 2-3 years | Analyze and understand data to influence product or business decisions, Experimentation and iteration to improve existing ML models, Do research that advances the state of the art of machine learning | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub | Python, R, Bash | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc) | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) | Scikit-learn , TensorFlow , Keras , PyTorch | Google Cloud Platform (GCP) , Amazon Web Services (AWS) , Microsoft Azure | Google Compute Engine (GCE), AWS Lambda, Azure Virtual Machines | Google BigQuery, Databricks | SAS, Azure Machine Learning Studio, Google Cloud Machine Learning Engine | Google AutoML , Tpot , Auto-Keras , Auto-Sklearn , Auto_ml | MySQL, PostgresSQL |
| 7 | 22-24 | Female | United States of America | Bachelor’s degree | Data Scientist | > 10,000 employees | 20+ | We recently started using ML methods (i.e., models in production for less than 2 years) | 80,000-89,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 3, -1 | 3-5 years | Python | Once | 3-4 years | Analyze and understand data to influence product or business decisions, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows | Hacker News (https://news.ycombinator.com/), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder | Microsoft Azure Notebooks , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python | Matplotlib , Plotly / Plotly Express | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | None | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | NaN | Scikit-learn , TensorFlow , Keras , Spark MLib | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 22-24 | Male | United States of America | Bachelor’s degree | Student | NaN | NaN | NaN | NaN | NaN | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 4, -1 | 3-5 years | Python | Never | 1-2 years | NaN | Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , Atom | Google Colab | Python | Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Evolutionary Approaches, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks | None | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | NaN | Scikit-learn , Xgboost , PyTorch , LightGBM | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | 55-59 | Male | Netherlands | Master’s degree | Other | 0-49 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 5, -1 | 5-10 years | Python | Never | < 1 years | Other | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | None | Python, SQL | Matplotlib , D3.js , Seaborn | CPUs | Linear or Logistic Regression, Bayesian Approaches, Generative Adversarial Networks | None | None | NaN | Scikit-learn , PyTorch | None | None | None | None | None | MySQL |
Last rows
| | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Select any activities that make up an important part of your role at work: | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which categories of computer vision methods do you use on a regular basis? | Which of the following natural language processing (NLP) methods do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis? | Which of the following cloud computing platforms do you use on a regular basis? | Which specific cloud computing products do you use on a regular basis? | Which specific big data / analytics products do you use on a regular basis? | Which of the following machine learning products do you use on a regular basis? | Which automated machine learning tools (or partial AutoML tools) do you use on a regular basis? | Which of the following relational database products do you use on a regular basis? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19707 | 18-21 | Male | Viet Nam | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19708 | 25-29 | Female | India | Professional degree | Not employed | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), LinkedIn Learning, University Courses (resulting in a university degree) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19709 | 25-29 | Prefer not to say | Austria | No formal education past high school | Data Scientist | 250-999 employees | 1-2 | We use ML methods for generating insights (but do not put working models into production) | 1,000-1,999 | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | Analyze and understand data to influence product or business decisions | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19710 | 22-24 | Male | India | Bachelor’s degree | Data Scientist | 50-249 employees | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19711 | 18-21 | Male | India | Master’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19712 | 50-54 | Male | Japan | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19713 | 18-21 | Male | India | Bachelor’s degree | Other | 250-999 employees | 3-4 | I do not know | $0-999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 28, -1 | 1-2 years | NaN | NaN | NaN | NaN | Reddit (r/machinelearning, r/datascience, etc) | DataCamp, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , Visual Studio / Visual Studio Code , Spyder , Notepad++ , Sublime Text | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19714 | 35-39 | Male | India | Master’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera, Kaggle Courses (i.e. Kaggle Learn) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19715 | 25-29 | Male | India | Master’s degree | Statistician | 50-249 employees | 15-19 | We recently started using ML methods (i.e., models in production for less than 2 years) | 1,000-1,999 | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | Other | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19716 | 50-54 | Male | France | Bachelor’s degree | Software Engineer | > 10,000 employees | 20+ | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 25, -1 | 3-5 years | Python | Never | 4-5 years | Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data, Build prototypes to explore applying machine learning to new areas | Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | IBM Watson Studio | Python, SQL, Java, Bash | Matplotlib | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | NaN | NaN | Scikit-learn , Spark MLib | NaN | NaN | NaN | NaN | NaN | NaN |
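The two sample tables hold the first and last ten rows by position: with 19,717 observations the final index is 19,716, so the tail starts at 19,707. A sketch of the same selection on a plain list, assuming a sample size of 10 to match the tables (the profiler's configured default is not stated in this section). `sample_rows` is a hypothetical helper; when a dataset has fewer than 2n rows, the two samples overlap.

```python
def sample_rows(rows, n=10):
    """Return the first and last n rows by position, like the Sample tables."""
    if n <= 0:
        return [], []  # guard: rows[-0:] would return the whole list
    return rows[:n], rows[-n:]


indices = list(range(19717))  # stand-in for the 19,717 row positions
first, last = sample_rows(indices)
print(first[0], first[-1], last[0], last[-1])  # 0 9 19707 19716
```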
Duplicate rows
Most frequently occurring
| | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Select any activities that make up an important part of your role at work: | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which categories of computer vision methods do you use on a regular basis? | Which of the following natural language processing (NLP) methods do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis? | Which of the following cloud computing platforms do you use on a regular basis? | Which specific cloud computing products do you use on a regular basis? | Which specific big data / analytics products do you use on a regular basis? | Which of the following machine learning products do you use on a regular basis? | Which automated machine learning tools (or partial AutoML tools) do you use on a regular basis? | Which of the following relational database products do you use on a regular basis? | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | 18-21 | Male | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 30 |
| 74 | 22-24 | Male | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 17 |
| 20 | 18-21 | Male | India | Bachelor’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12 |
| 55 | 22-24 | Male | China | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12 |
| 118 | 25-29 | Male | United States of America | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10 |
| 132 | 30-34 | Male | Japan | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10 |
| 21 | 18-21 | Male | India | Bachelor’s degree | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9 |
| 67 | 22-24 | Male | India | Bachelor’s degree | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8 |
| 53 | 22-24 | Male | China | Master’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 |
| 99 | 25-29 | Male | China | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 |
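The "# duplicates" column counts how many times an identical row occurs. A stdlib sketch of that tally, assuming rows are tuples with `None` standing in for NaN; since `None == None`, rows that match including their missing cells are grouped together, which is consistent with the mostly-NaN rows at the top of the table. Whether the overview's "Duplicate rows: 157" counts distinct repeated rows or surplus copies is not stated in this section, so the sketch returns the per-row counts and leaves that choice open. `most_frequent_duplicates` is a hypothetical name.

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Rank identical rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, c) for row, c in counts.items() if c > 1]
    # Highest count first; sort is stable, so ties keep first-seen order.
    repeated.sort(key=lambda item: item[1], reverse=True)
    return repeated[:top]


rows = [
    ("18-21", "Male", "India", None),
    ("22-24", "Male", "India", None),
    ("18-21", "Male", "India", None),
    ("25-29", "Female", "France", "Master's degree"),
    ("22-24", "Male", "India", None),
    ("18-21", "Male", "India", None),
]
for row, count in most_frequent_duplicates(rows):
    print(row, count)
# ('18-21', 'Male', 'India', None) 3
# ('22-24', 'Male', 'India', None) 2
```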